Randomization Inference

Edward Vytlacil

Review: Hypothesis Testing

  • Review: Hypothesis Testing

  • Randomization Inference

  • Applying Randomization Inference to PROGRESA

Review: Hypothesis Testing

Definition: Null and Alternative Hypotheses

  • Null Hypothesis \(\mathbb{H}_0\),

    • Hypothesis we accept unless there is sufficiently strong evidence against it.
  • Alternative Hypothesis \(\mathbb{H}_1\)

    • Hypothesis we accept if we reject the null.

Review: Hypothesis Testing

Definition: Test Statistic

A test statistic \(T_n\) is a function of our sample that summarizes evidence against the null, with larger values indicating stronger evidence against the null.

Definition: p-value

The p-value is the probability under the null of observing a test statistic as large or larger as the one observed in the sample.

Review: Hypothesis Testing

Definition: Hypothesis Test

A hypothesis test is a decision rule that specifies for which values of the test statistic \(T_n\) we accept or fail to accept the null hypothesis.

Review: Hypothesis Testing

Review: Hypothesis Testing

Definition: Type I & II Errors

  • Type I Error: Reject null when null is true

  • Type II Error: Fail to reject null when null is false

Review: Hypothesis Testing

Definition: Significance Level of Test

A test has significance level \(\alpha\) if probability of Type I error when null hypothesis is true is less than or equal to \(\alpha\).

Review: Hypothesis Testing

Definition: Significance Level of Test

A test has significance level \(\alpha\) if probability of Type I error when null hypothesis is true is less than or equal to \(\alpha\).

Theorem:

Suppose \(p_n\) is a p-value for the null hypothesis \(\mathbb{H}_0\). Then the test that rejects the null whenever \(p_n \le \alpha\) has significance level \(\alpha\).

Randomization Inference

  • Review: Hypothesis Testing

  • Randomization Inference

  • Applying Randomization Inference to PROGRESA

Consider Hypothetical RCT

  • We observe \((X_i, Y_i)\) for each individual.

In this example,

  • \(\bar{Y}_1 = \frac{1+1+1}{3}=1\).
  • \(\bar{Y}_0 = \frac{1+0}{2}=\frac{1}{2}\).
\(i\) \(X_i\) \(Y_i\)
1 1 1
2 1 1
3 0 0
4 1 1
5 0 1


\(\bar{Y}_1 -\bar{Y}_0 = \frac{1}{2}\), but is difference just by random chance?

Inference

For any inference procedure:

  • Formally state null and alternative hypotheses,

  • Choose a test statistic to measure evidence against the null hypothesis,

  • Calculate distribution of test statistic under the null to construct p-values, perform hypothesis test.

Null: No Treatment Effect on Average

Definition: Weak Null

  • \(\mathbb{H}_0: \mathbb{E}[ Y_{1,i} - Y_{0,i}] =0\), vs

  • \(\mathbb{H}_1: \mathbb{E}[ Y_{1,i} - Y_{0,i}] \ne 0.\)

  • Null of no effect on average,
    versus alternative of some effect on average.

  • Commonly tested with t-test, based on asymptotic normality.

Null: No Treatment Effect on Anyone

Definition: Strong Null

  • \(\mathbb{H}_0: Y_{1,i} - Y_{0,i}=0\) for all \(i\), vs.

  • \(\mathbb{H}_1: Y_{1,i} - Y_{0,i}=0\) for at least one \(i\).

  • Null of no effect on any subject,
    versus alternative of some effect on someone.

  • Randomization inference will test strong null as we can find distribution of test statistic under sharp null exploiting known randomization procedure.

Consider Hypothetical RCT

  • We observe \((X_i, Y_i)\) for each individual.
  • What do we know about \(Y_{0,i}\), \(Y_{1,i}\) for each individual?
\(i\) \(Y_{0,i}\) \(Y_{1,i}\) \(X_i\) \(Y_i\)
1 1 1
2 1 1
3 0 0
4 1 1
5 0 1

Consider Hypothetical RCT

  • What do we know about \(Y_{0,i}\), \(Y_{1,i}\) for each individual?
  • If \(X_i=1\), then

    • \(Y_{1,i}=Y_{i}\)

    • We do not know \(Y_{0,i}\)

  • If \(X_i=0\), then

    • \(Y_{0,i}=Y_{i}\)

    • We do not know \(Y_{i}\)

\(i\) \(Y_{0,i}\) \(Y_{1,i}\) \(X_i\) \(Y_i\)
1 ? 1 1 1
2 ? 1 1 1
3 0 ? 0 0
4 ? 1 1 1
5 1 ? 0 1


However, under the sharp null, \(Y_{1,i}=Y_{0,i}\) for all \(i\), . . .

Consider Hypothetical RCT

  • Under sharp null:

    • \(Y_i = Y_{1,i}=Y_{0,i}\) for all \(i\).

    • Under sharp null hypothesis, we can fill in missing elements of the table.

\(i~~~~\) \(Y_{0,i}\) \(Y_{1,i}\) \(X_i\) \(Y_i\)
1 (1) 1 1 1
2 (1) 1 1 1
3 0 (0) 0 0
4 (1) 1 1 1
5 1 (1) 0 1

Randomization inference: exploit randomization procedure and that we know both \(Y_{0,i}\) and \(Y_{1,i}\) under the null to find distribution of test statistic under the null, and thus to find p-values and test the null.

Randomization Inference:

  • Sharp null,
    • \(\mathbb{H}_0: Y_{1,i} - Y_{0,i}=0\) for all \(i\), vs.
    • \(\mathbb{H}_1: Y_{1,i} - Y_{0,i}=0\) for some \(i\).
  • Chose test statistic
    • e.g., \(T_N = | \bar{Y}_1 - \bar{Y}_0 |\).
  • Calculate distribution of test statistic under the null to construct p-values, perform test.

Distribution of \(T_N\) Under Sharp Null

  • There are \({5 \choose 3}=10\) ways to assign treatment to \(3\) out of \(5\) subjects.

  • Under sharp null, we can calculate value of test statistic for each way to assign treatment.

  • Suppose randomization procedure puts equal probability on each possible way to assign treatment. Then we can find distribution of test statistic under sharp null.

Distribution of \(T_N\) Under Sharp Null

\(i\) \(Y_i\) \(X_1\) \(X_2\) \(X_3\) \(X_4\) \(X_5\) \(X_6\) \(X_7\) \(X_8\) \(X_9\) \(X_{10}\)
1 1 1 1 1 1 1 1 0 0 0 0
2 1 1 1 1 0 0 0 1 1 1 0
3 0 1 0 0 1 1 0 1 1 0 1
4 1 0 1 0 1 0 1 1 0 1 1
5 1 0 0 1 0 1 1 0 1 1 1
\(T_N=\) 1/3 1/2 1/2 1/3 1/3 1/2 1/3 1/3 1/2 1/3

Under sharp null, we know value of \(T_N\) for each possible treatment assignment. Using that randomization procedure makes each possible treatment assignment equally likely, we thus know distribution of \(T_N\) under the sharp null.

Distribution of \(T_N\) Under Sharp Null

\(i\) \(Y_i\) \(X_1\) \(X_2\) \(X_3\) \(X_4\) \(X_5\) \(X_6\) \(X_7\) \(X_8\) \(X_9\) \(X_{10}\)
1 1 1 1 1 1 1 1 0 0 0 0
2 1 1 1 1 0 0 0 1 1 1 0
3 0 1 0 0 1 1 0 1 1 0 1
4 1 0 1 0 1 0 1 1 0 1 1
5 1 0 0 1 0 1 1 0 1 1 1
\(T_N=\) 1/3 1/2 1/2 1/3 1/3 1/2 1/3 1/3 1/2 1/3

Under sharp null:

  • \(\Pr[T_N = 1/3] = 6/10\).

  • \(\Pr[T_N = 1/2] = 4/10\).

  • If \(T_N=1/3\), p-value\(=1\).

  • If \(T_N=1/2\), p-value\(=0.4\).

Applying Randomization Inference to PROGRESA

  • Review: Hypothesis Testing

  • Randomization Inference

  • Applying Randomization Inference to PROGRESA

Randomization Inference: PROGRESA

Complication:

  • Randomization done at village level, not child level.
    • Modify procedure to consider possible treatment as village level.

Randomization Inference: PROGRESA

Complication:

  • Randomization of 308 out of 491 villages.
    • Very large number of ways to assign treatment to 308 out of 491 villages. . .
choose(491,308)
[1] 2.452172e+139

Randomization Inference: PROGRESA

Complication:

  • Not feasible to enumerate all possible \({491 \choose 381}\) ways to assign treatment.
  • Solution: use stochastic approximation
    • Randomly draw \(K\) out of \({491 \choose 381}\) possible treatment assignments.
    • Base randomization inference on those \(K\) possible treatment assignments, justified as \(K \rightarrow \infty\).

Randomization Inference: PROGRESA

  • Randomization inference on PROGRESA Sample
    • “clustering” at village level
    • Using 10,000 from the \({491 \choose 381}\) possible ways to assign treatment to 381 out of 491 villages.

Randomization Inference: PROGRESA

  • Randomization inference on PROGRESA Sample
    • Outcome is school enrollment
    • Test statistic is \(T_N = \bar{Y}_1 - \bar{Y}_0\).

Randomization Inference: PROGRESA

  • Randomization inference on PROGRESA Sample

    • observed value of test statistic: 0.0392395
    • p-value: 6^{-4}
    • p-value \(< 0.05\), so reject at \(0.05\)-significance level.
                   term   estimate two_tailed_p_value
1 Custom Test Statistic 0.03923947              6e-04

Code:

library(ri2, quietly = TRUE)

#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))   
m.0<- length(unique(subset(dfPost,
          dfPost$treat==1)$sooloca))  
declaration_ra<-declare_ra(N=n.0,
          m=m.0,clusters=dfPost$sooloca)

# perform randomization inference 
ri_out <-  conduct_ri(
    test_function = function(d){
       abs(mean(d[d$treat==1,]$school) 
         -mean(d[d$treat==0,]$school))}, 
    declaration = declaration_ra, 
    assignment = "treat", 
    outcome = "school", 
    data = dfPost, 
    sims = 10000 
  )

# showing results
plot(ri_out)
summary(ri_out)
  • Using package ri2 to perform randomization inference
    (ri2 documentation).

Code:

library(ri2, quietly = TRUE)

#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))   
m.0<- length(unique(subset(dfPost,
          dfPost$treat==1)$sooloca))  
declaration_ra<-declare_ra(N=n.0,
          m=m.0,clusters=dfPost$sooloca)

# perform randomization inference 
ri_out <-  conduct_ri(
    test_function = function(d){
       abs(mean(d[d$treat==1,]$school) 
         -mean(d[d$treat==0,]$school))}, 
    declaration = declaration_ra, 
    assignment = "treat", 
    outcome = "school", 
    data = dfPost, 
    sims = 10000 
  )

# showing results
plot(ri_out)
summary(ri_out)
  • sooloca is village id.
  • n.0 is no. of villages.
  • m.0 is no. of treated villages.
  • using declare_ra from package ri2 to declare that randomization was at village level, with each of the ways to assign treatment to m.0 out of n.0 villages equally likely.

Code:

library(ri2, quietly = TRUE)

#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))   
m.0<- length(unique(subset(dfPost,
          dfPost$treat==1)$sooloca))  
declaration_ra<-declare_ra(N=n.0,
          m=m.0,clusters=dfPost$sooloca)

# perform randomization inference 
ri_out <-  conduct_ri(
    test_function = function(d){
       abs(mean(d[d$treat==1,]$school) 
         -mean(d[d$treat==0,]$school))}, 
    declaration = declaration_ra, 
    assignment = "treat", 
    outcome = "school", 
    data = dfPost, 
    sims = 10000 
  )

# showing results
plot(ri_out)
summary(ri_out)
  • using declare_ra from package ri2 to declare that randomization was at village level, with each of the ways to assign treatment to m.0 out of n.0 villages equally likely.

Code:

library(ri2, quietly = TRUE)

#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))   
m.0<- length(unique(subset(dfPost,
          dfPost$treat==1)$sooloca))  
declaration_ra<-declare_ra(N=n.0,
          m=m.0,clusters=dfPost$sooloca)

# perform randomization inference 
ri_out <-  conduct_ri(
    test_function = function(d){
       abs(mean(d[d$treat==1,]$school) 
         -mean(d[d$treat==0,]$school))}, 
    declaration = declaration_ra, 
    assignment = "treat", 
    outcome = "school", 
    data = dfPost, 
    sims = 10000 
  )

# showing results
plot(ri_out)
summary(ri_out)
  • using function conduct_ri from package ri2 to perform randomization inference.

  • test_function is defining test statistic, here \(T_N = |\bar{Y}_1-\bar{Y}_0|\).

Code:

library(ri2, quietly = TRUE)

#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))   
m.0<- length(unique(subset(dfPost,
          dfPost$treat==1)$sooloca))  
declaration_ra<-declare_ra(N=n.0,
          m=m.0,clusters=dfPost$sooloca)

# perform randomization inference 
ri_out <-  conduct_ri(
    test_function = function(d){
       abs(mean(d[d$treat==1,]$school) 
         -mean(d[d$treat==0,]$school))}, 
    declaration = declaration_ra, 
    assignment = "treat", 
    outcome = "school", 
    data = dfPost, 
    sims = 10000 
  )

# showing results
plot(ri_out)
summary(ri_out)
  • declaration is defining randomization procedure, defined above with declare_ra function.

Code:

library(ri2, quietly = TRUE)

#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))   
m.0<- length(unique(subset(dfPost,
          dfPost$treat==1)$sooloca))  
declaration_ra<-declare_ra(N=n.0,
          m=m.0,clusters=dfPost$sooloca)

# perform randomization inference 
ri_out <-  conduct_ri(
    test_function = function(d){
       abs(mean(d[d$treat==1,]$school) 
         -mean(d[d$treat==0,]$school))}, 
    declaration = declaration_ra, 
    assignment = "treat", 
    outcome = "school", 
    data = dfPost, 
    sims = 10000 
  )

# showing results
plot(ri_out)
summary(ri_out)
  • assignment is defining treatment variable.

  • outcome is defining outcome variable.

  • sims is number of random draws of possible treatment assignments.

Code:

library(ri2, quietly = TRUE)

#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))   
m.0<- length(unique(subset(dfPost,
          dfPost$treat==1)$sooloca))  
declaration_ra<-declare_ra(N=n.0,
          m=m.0,clusters=dfPost$sooloca)

# perform randomization inference 
ri_out <-  conduct_ri(
    test_function = function(d){
       abs(mean(d[d$treat==1,]$school) 
         -mean(d[d$treat==0,]$school))}, 
    declaration = declaration_ra, 
    assignment = "treat", 
    outcome = "school", 
    data = dfPost, 
    sims = 10000 
  )

# showing results
plot(ri_out)
summary(ri_out)

                   term   estimate two_tailed_p_value
1 Custom Test Statistic 0.03923947              6e-04

Randomization Inference: Subgroups

  • For any subgroup defined at baseline, we can test sharp null of no effect within that subgroup.
    • For example, for boys and girls separately
    • For example, by highest grade completed at baseline.

Randomization Inference: Boys

  • Randomization inference on Sample of Boys

  • p-value: 0.0124

  • p-value \(< 0.05\), so reject at \(0.05\)-significance level.

Randomization Inference: Girls

  • Randomization inference on Sample of Girls

  • p-value: 0.0012

  • p-value \(< 0.05\), so reject at \(0.05\)-significance level.

Randomization Inference by Grade

p-values by sex and grade
Grade p-value boys p-value girls
1 0.349 0.610
2 0.006 0.926
3 0.702 0.337
4 0.0003 0.022
5 0.294 0.077
6 0.002 0.008
7 0.011 0.0003
8 0.564 0.635
9 0.285 0.681
10 0.265 0.668

Randomization inference conditional on sex and grade of child:

  • Showing p-values for sharp null of no effect.

  • How would you summarize the evidence of whether PROGRESA has an effect, and on whom?

  • Possible concern: data mining.

    • Correct for multiple hypothesis testing using, e.g., Holm’s Procedure?